NASA Risk and Safety Culture: 


Minimizing the Risk of Catastrophe by Bringing the Lessons of Space Home 


NASA is challenged with integrating cutting-edge risk management practices and techniques throughout 
a large organization with an extraordinarily diverse mission. This presentation details how past mission 
and institutional failures helped NASA establish a safety culture and a uniform risk management 
environment. This presentation will also explore: 


• How the loss prevention industry has influenced NASA’s approach to sustaining its unique, yet aging, infrastructure.

• How the lessons learned at NASA can augment and enhance the strong operational risk management capabilities that already exist within hazardous industries.

• How NASA’s safety capabilities have been specifically developed to minimize the risk of loss of life and loss of mission.

• Moving beyond the use of historical data, specifically:

  o Proactive and predictive: designed to give you the information you need when you need it, not when it’s too late.

  o Designed to flag warnings early: before catastrophic events occur.

  o Demonstrated: these capabilities have been used at NASA for years.
Words of Wisdom


“It can only be attributable to human error.”
-- HAL 9000 (2001: A Space Odyssey) 
• NASA’s Losses in Space and on the Ground
  - Failure is not an option we choose, but it is a reality we must face....

• The Impact of Human Factors on Mishaps

• Human Error Integrated in Risk Assessment
  - Acknowledging human frailty and modeling error probabilities.

• NASA’s Safety Culture: Minimizing the Risk Environment
  - Reducing error by cultivating skill-based behavior.
  - Bolstering trust throughout operations.
  - Measuring safety culture growth.


NASA’s Losses


Columbia STS-107, February 1, 2003:
• 7 fatalities;
• $3 Billion vehicle loss;
• 2.5 year mission impact.

NOAA N-Prime, September 6, 2003:
• $135 Million vehicle damage;
• 5.5 year mission impact.

Genesis, September 8, 2004:
• Some sample retrieval materials lost.

DART, April 16, 2005:
• Proximity operations mission objectives lost.

OCO, February 24, 2009:
• $280 Million vehicle loss;
• 5+ year mission impact.

Glory, March 4, 2011:
• $424 Million vehicle loss;
• ??? mission impact.


Recent Institutional Mishaps 


KSC Roofing Fatality, March 17, 2006
• Subcontractor died from head injuries suffered due to fall.

MSFC Freedom Star Tow-wire Injury, December 12, 2006
• Hospitalization due to internal injuries from impact with SRB tow-wire.

JSC Chamber B Asphyxiation, July 28, 2010
• Shoulder injury due to asphyxiation and fall.

WFF CNC Injury, October 28, 2010
• Sub-dermal tissue damage due to impact from machine tool shrapnel.


What is the impact of Human Factors? 


• Estimates of the share of catastrophic mishaps due to human error range from 65% to 90%.
  - NASA's human factors-related mishap causes are estimated at ~75%.

• As much as we'd like to error-proof our work environment, even the most automated and complex technical endeavors require human interaction...and are vulnerable to human frailty.

• Industry and government are focusing not only on integrating human factors into hazardous work environments, but also on practical approaches to cultivating a strong Safety Culture that diminishes risk.


Some Risk Measurement Philosophy...


As much as we’d like to be able to predict error, the reality is that we must 
measure known performance characteristics to identify vulnerabilities, 
mitigate greatest risk, and enable prudent response to the next accident. 
High Risk Occupations vs. Space Flight
Person-Fatality Risk Per Year

Truck Driver                                  1:3790
Timber Cutting and Logging                    1:998
Airline Pilot                                 1:1270
Alaskan Commuter Pilot                        1:336
Construction Worker                           1:4190
Extraction: Mining, Oil and Gas               1:4420
Commercial Fishing                            1:851
Alaskan Commercial Fishing                    1:775
Northeast Multispecies Groundfish Fishing     1:166
Shuttle Astronaut                             1:218
Mt. Everest Climber

[Bar chart axis: 0, 1:100, 1:50, 1:33]

• Miner risk does not include fatalities due to chronic illnesses like “black lung.”
• Risk increases as we “drill down” into smaller and smaller groups that drive the risk.
• Shuttle Astronauts are a very small group that has high risk.
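
To compare the entries above on a common scale, the "1:N" odds can be read as an annual probability of 1/N. The following is a minimal sketch under that reading; the function names are hypothetical, and the Mt. Everest entry is omitted because no value survives in the text.

```python
from fractions import Fraction

# Annual person-fatality odds as listed above ("1:N" is read as 1 in N per year).
ANNUAL_ODDS = {
    "Truck Driver": "1:3790",
    "Timber Cutting and Logging": "1:998",
    "Airline Pilot": "1:1270",
    "Alaskan Commuter Pilot": "1:336",
    "Construction Worker": "1:4190",
    "Mining, Oil and Gas": "1:4420",
    "Commercial Fishing": "1:851",
    "Alaskan Commercial Fishing": "1:775",
    "Northeast Multispecies Groundfish Fishing": "1:166",
    "Shuttle Astronaut": "1:218",
}


def odds_to_probability(odds: str) -> Fraction:
    """Convert a '1:N' string into the exact probability 1/N."""
    numerator, denominator = odds.split(":")
    return Fraction(int(numerator), int(denominator))


def rank_by_risk(odds_table: dict[str, str]) -> list[tuple[str, float]]:
    """Return (occupation, annual probability) pairs, highest risk first."""
    pairs = [(name, float(odds_to_probability(o))) for name, o in odds_table.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, p in rank_by_risk(ANNUAL_ODDS):
        # Express each risk as expected fatalities per 100,000 person-years.
        print(f"{name:45s} {p * 100_000:7.1f} per 100,000 per year")
```

Sorting this way makes the "drill down" point visible: the narrowly defined groups (groundfish fishing, shuttle astronauts) rise to the top of the list.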
NASA's Risk Assessment Concepts & Requirements


Risk Informed Decision-Making (RIDM)* involves:

(1) Identification of decision alternatives, recognizing opportunities where they arise, and considering a sufficient number and diversity of performance measures to constitute a comprehensive set for decision-making purposes.

(2) Risk analysis of decision alternatives to support ranking.

(3) Selection of a decision alternative informed by (not solely based on) risk analysis results.

[Figure: Risk Management, RIDM within an Organizational Unit. Labels: Identification of Alternatives; Risk-Informed Alternative Selection; Baseline Performance Requirements; CRM Feedback to RIDM; Communicate; Document; Elevate Decision One Level Up if Needed and Report Top Risks One Level Up.]


* NPR 8000.4, Agency Risk Management Procedural Requirements
Risk Scorecard


JSC RISK MATRIX


Likelihood (most likely to least likely):
- Expected to happen. Controls have minimal to no effect.
- Likely to happen. Controls have significant limitations or uncertainties.
- Could happen. Controls exist, with some limitations or uncertainties.
- Not expected to happen. Controls have minor limitations or uncertainties.
- Highly Unlikely: Extremely remote possibility that it will happen. Strong controls in place.

Risk level and response:
- High: Mitigate; implement new processes, change requirements, or re-baseline.
- Moderate: Manage/consider alternative processes, or Accept.
- Low: Manage within normal processes; or Close.


CONSEQUENCE (subcategories; each row runs from least to most severe)

Personnel
  1: Minor injury; Minor OSHA violation
  2: Short-term injury; Moderate OSHA violation
  3: Long-term injury, impairment or incapacitation; Significant OSHA violation
  4: Permanent injury or incapacitation; Major OSHA violation
  5: Loss of life

System, Facility
  1: Minor damage to asset
  2: [illegible]
  3: Loss of non-critical asset
  4: Damage to a critical asset
  5: Loss of critical asset or [illegible]

Environment
  1: Minor or non-reportable hazard or incident
  2: Moderate hazard or reportable violation
  3: Significant violation; Event requires immediate remediation
  4: Major violation; Event causes temporary work stoppage
  5: Catastrophic hazard

TECHNICAL: Performance
  1: Minor impact to mission objectives or requirements
  2: Incomplete compliance with a key mission objective
  3: Noncompliance; Significant impact to mission
  4: Noncompliance; Major impact on Center or Spaceflight mission
  5: Failure to meet mission objectives

CENTER CAPABILITIES: Infrastructure
  1: Minor impact or reduced effectiveness
  2: Moderate impact or damage to infrastructure
  3: Significant damage to infrastructure or reduced support
  4: Mission delays or major impacts to Center operations
  5: Extended loss of critical capabilities

Workforce
  1: Minor impact to human capital
  2: Moderate impact to human capital
  3: Significant impact; [illegible] critical skill
  4: Major impact; Loss of skillset
  5: Loss of Core Competency

Organizational or CMO Impact
  1: <2% Budget increase or <$1M CMO Threat
  2: 2-5% Budget increase or $1M-$5M CMO Threat
  3: 5-10% Budget increase or $5M-$10M CMO Threat
  4: 10-15% Budget increase or $10M-$60M CMO Threat
  5: >15% Budget increase or >$60M CMO Threat

SCHEDULE
  1: Minor milestone slip
  2: Moderate milestone slip; Schedule margin available
  3: Project milestone slip; No impact to a critical path
  4: Major milestone slip; Impact to a critical path
  5: Failure to meet critical milestones


Title (Notional Risk Titles)
       Test system maintenance
4x5    Mission essential resource limitations
4x3    Building Refurbishment

Legend
- Top Center Risk (TCR)
- Proposed Top Center Risk (Proposed TCR)
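
A scorecard like the one above can be read as a lookup from a (likelihood, consequence) pair to a risk level and its response. The sketch below is illustrative only. It assumes both axes run from 1 to 5 and that entries such as "4x5" are likelihood x consequence; the score thresholds are hypothetical, since the cell-by-cell coloring of the JSC matrix is not recoverable from the text.

```python
from dataclasses import dataclass

# ASSUMPTION: hypothetical score bands; the real JSC matrix assigns levels
# cell by cell and may not follow a simple product threshold.
HIGH_MIN_SCORE = 15
MODERATE_MIN_SCORE = 6

# Response text taken from the scorecard above.
RESPONSES = {
    "High": "Mitigate; implement new processes, change requirements, or re-baseline",
    "Moderate": "Manage/consider alternative processes, or Accept",
    "Low": "Manage within normal processes; or Close",
}


@dataclass(frozen=True)
class Risk:
    title: str
    likelihood: int   # 1 (Highly Unlikely) .. 5 (expected to happen)
    consequence: int  # 1 (least severe) .. 5 (most severe)


def classify(risk: Risk) -> str:
    """Map a risk to High / Moderate / Low using the assumed score bands."""
    for value in (risk.likelihood, risk.consequence):
        if not 1 <= value <= 5:
            raise ValueError("likelihood and consequence must be in 1..5")
    score = risk.likelihood * risk.consequence
    if score >= HIGH_MIN_SCORE:
        return "High"
    if score >= MODERATE_MIN_SCORE:
        return "Moderate"
    return "Low"


if __name__ == "__main__":
    for risk in (
        Risk("Mission essential resource limitations", 4, 5),
        Risk("Building Refurbishment", 4, 3),
    ):
        level = classify(risk)
        print(f"{risk.title}: {level} -> {RESPONSES[level]}")
```

A product threshold is the simplest possible rule; a table keyed on each (likelihood, consequence) cell would be the closer analogue of a printed matrix.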


Probabilistic Risk Assessment (PRA)


• PRA integrates models based on systems engineering, probability and statistics, reliability and maintainability engineering, physical and biological sciences, decision theory, and expert opinion.

• PRA is needed when decisions need to be made that involve high stakes in a complex situation.

• The collection of risk scenarios allows the dominant risk factors to be identified, then modified or eliminated to improve the probability of success.

[Figure labels: Information; Engineering; (Via Inference); (aleatory)]
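
The idea of collecting risk scenarios and identifying the dominant contributors can be sketched in a few lines. This is a simplified illustration, not the PRA procedure NASA uses: the scenario names and probabilities are hypothetical, each scenario is treated as a chain of independent events, and the total uses the rare-event approximation (a plain sum).

```python
from math import prod

# Hypothetical scenarios: an initiating-event probability followed by the
# conditional probabilities of each further failure needed for a loss.
SCENARIOS = {
    "leak, detection fails, ignition": [1e-2, 0.1, 0.05],
    "external impact, barrier fails": [5e-3, 0.02],
    "operator skips step, check misses it": [2e-2, 0.01],
}


def scenario_probability(chain: list[float]) -> float:
    """Probability of one scenario, assuming its events are independent."""
    return prod(chain)


def dominant_contributors(scenarios: dict[str, list[float]]):
    """Return total loss probability and scenarios ranked by contribution."""
    probs = {name: scenario_probability(chain) for name, chain in scenarios.items()}
    total = sum(probs.values())  # rare-event approximation
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return total, [(name, p, p / total) for name, p in ranked]


if __name__ == "__main__":
    total, ranked = dominant_contributors(SCENARIOS)
    print(f"Total probability of loss: {total:.2e}")
    for name, p, share in ranked:
        print(f"  {name}: {p:.2e} ({share:.0%} of total)")
```

Once the ranking is available, the largest contributor is the natural first target for modification or elimination.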
Human Reliability Analysis (HRA) Integration
with Probabilistic Risk Assessment (PRA)


• In the PRA context, HRA is the assessment of the reliability and risk impact of the interactions of humans on a system or a function.

• For situations that involve a large number of human-system interactions, HRA becomes an important element of PRA to ensure a realistic assessment of the risk.

• In general, the Human Reliability Analysis process has a number of distinct steps:

1. Select initiating activity based on consequence assessment.
2. Identification of specific human motions, behaviors, actions, and dependent environments.
3. Error options and probabilities are modeled and can be iteratively modified based on system design, procedures or risk control adjustments.
4. Determine overall probability of consequence.


Adapted from NASA/SP-2011-3421, Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners
David T. Loyd
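
The last two HRA steps (modeling error probabilities, then determining the overall probability of the consequence) can be illustrated with a small calculation. This is a minimal sketch with hypothetical actions and numbers. It assumes the human errors are independent and that any single unrecovered error leads to the consequence, which a real HRA would not take for granted.

```python
from math import prod

# Hypothetical task breakdown: (human action, probability of error,
# probability the error is caught and recovered before it matters).
ACTIONS = [
    ("read pressure gauge", 0.003, 0.5),
    ("select correct valve", 0.001, 0.0),
    ("verify closure", 0.01, 0.9),
]


def unrecovered_error(p_error: float, p_recovery: float) -> float:
    """Probability that an error occurs and is not recovered."""
    return p_error * (1.0 - p_recovery)


def overall_failure_probability(actions) -> float:
    """Probability that at least one action fails unrecovered.

    ASSUMPTION: errors on different actions are independent.
    """
    success = prod(1.0 - unrecovered_error(p, r) for _, p, r in actions)
    return 1.0 - success


if __name__ == "__main__":
    baseline = overall_failure_probability(ACTIONS)
    # Iterative modification: a procedure change adds an independent check
    # on valve selection, modeled here as a 0.8 recovery probability.
    revised = [a if a[0] != "select correct valve" else (a[0], a[1], 0.8) for a in ACTIONS]
    print(f"baseline: {baseline:.6f}")
    print(f"revised:  {overall_failure_probability(revised):.6f}")
```

Rerunning the calculation after each design or procedure change is what makes the model useful for comparing risk control options.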


Performance Shaping Factors (PSF) 


• PSFs impact human performance in a variety of ways, such as intelligence, expertise, emotion, harsh conditions, conflicting orders, etc.

• PSFs are incorporated into HRA error modeling, accommodating anticipated human interaction with critical tasking.

• We work to minimize the effects of PSFs, but our expectation of performance must acknowledge their potential impact to operations.
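
One common way to incorporate PSFs into error modeling is to scale a nominal human error probability (HEP) by a multiplier for each factor, as multiplier-based HRA methods such as SPAR-H do. The source does not say which method is used here, so the sketch below is an assumption throughout: the function name, the multipliers, and the simple cap at 1.0 are all hypothetical.

```python
from math import prod


def adjusted_hep(nominal_hep: float, psf_multipliers: dict[str, float]) -> float:
    """Scale a nominal HEP by the product of PSF multipliers, capped at 1.0.

    Multipliers above 1 degrade performance (e.g. harsh conditions);
    multipliers below 1 improve it (e.g. high expertise).
    """
    if not 0.0 <= nominal_hep <= 1.0:
        raise ValueError("nominal_hep must be a probability")
    if any(m <= 0 for m in psf_multipliers.values()):
        raise ValueError("PSF multipliers must be positive")
    composite = prod(psf_multipliers.values())
    return min(1.0, nominal_hep * composite)


if __name__ == "__main__":
    # Hypothetical multipliers for illustration only.
    psfs = {"harsh conditions": 5.0, "conflicting orders": 2.0, "expertise": 0.5}
    print(adjusted_hep(0.001, psfs))
```

The cap is a crude guard against probabilities above 1 when several adverse factors stack; published methods use more careful adjustments for that case.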


Minimizing Human Error
and Cultivating a Reduced Risk Environment


Rasmussen’s 3 Human Responses to Operator Information Processing 
1. Skill-based: requires little or no cognitive effort. 


2. Rule-based: driven by procedures or rules. 
3. Knowledge-based: requires problem solving/decision making. 


“The fewer rules a coach has, the fewer 
rules there are for players to break.” 
John Madden 


Trust and Transparency Build Common Risk Tolerance


• Trust is what drives open reporting.

• Transparent dialog promotes availability of information to inform more robust decision-making.

• The result is uniform engagement to optimize success potential and accept a common risk tolerance (resilience).

This environment is the foundation of an effective safety culture.

[Figure labels: CLOSE CALL; NONCONFORMANCE; TRUST LEVEL and CLARITY]


How Safety Culture Promotes Operational Excellence 


• By advocating a pervasive Safety Culture, we can provide our workforce with:
  - Clear emphasis on continuous learning;
  - Encouragement to develop intuitive personal values;
  - Guidelines for decision-making behavior that focuses on long-term success;
  - Reinforcement to build trust by reporting and communicating concerns and ideas.

• Practicing an effective Safety Culture:
  - Builds Skill-based and Knowledge-based response mechanisms;
  - Reduces the emphasis on Rule-based response;
  - And breaks down barriers to Trust.


“An environment characterized by safe attitudes and behaviors modeled by leaders and embraced by all that fosters an atmosphere of open communication, mutual trust, shared safety values and lessons, and confidence that we will balance challenges and risks consistent with our core value of safety to successfully accomplish our mission.”


An effective safety culture is characterized by the following subcomponents:


Reporting Culture - We report our concerns
Just Culture - We have a sense of fairness
Flexible Culture - We change to meet new demands
Learning Culture - We learn from our successes and mistakes
Engaged Culture - Everyone does his or her part


January 25, 2017 
Catastrophic Event Impact


Apollo 1 - January 27, 1967


Reporting Culture - Procedures were subjected to last-minute changes that were not effectively tracked, recorded or communicated.

Just Culture - Poor morale and process discipline were evident in Command Module contractor performance prior to the incident.

Flexible Culture - Willingness to change course on design issues was weak, even in the presence of compelling and important information.

Learning Culture - Test planning failed to appreciate the significant hazards of a 100% oxygen environment.

Engaged Culture - NASA provided insufficient surveillance over management functions.


Apollo 13 - April 13, 1970


Reporting - Incomplete and sometimes incorrect information was used in problem solving.

Just - Absence of information on this factor attests to the general neglect at the time of organizational behavior as a key factor in mishaps.

Flexible - Demonstrated ability to adapt quickly to an emergency, although flexibility prior to the mishap is unclear.

Learning - While safeguards had been implemented following the Apollo 1 fire, key aspects of design, workmanship, and material use remained vulnerable to oxygen flammability.

Engaged - Solutions immediately following the oxygen tank explosion represent an engaged team.


Catastrophic Event Impact


Using the Safety Culture Model to Analyze NASA's History


Challenger - January 28, 1986


Reporting - Ineffective problem reporting requirements and practices.

Just - Stifled communication regarding O-ring susceptibility to cold conditions.

Flexible - Launch concerns were dismissed in the face of significant schedule pressure.

Learning - Trend analysis was inadequate, as evidenced by identification of a number of burn-through events which occurred prior to STS-51L.

Engaged - NASA management lacked involvement in critical discussions.
Using the Safety Culture Model to Analyze NASA’s History


Columbia - February 1, 2003


Reporting - Foam shedding was a known problem, yet foam impact data was still being analyzed at the time of the flight, and not considered a serious hazard.

Just - Some engineers were reluctant to raise concerns when faced with a return of an “in God we trust - all others bring data” attitude.

Flexible - Like the Challenger mishap, the Shuttle Program was experiencing schedule pressure challenges.

Learning - With “normalization of deviance,” foam had become classified as “in-family” and as a negligible risk to the orbiter.

Engaged - “Echoes” of the Challenger mishap were evident.


NASA Safety Culture Model Applied to Deepwater Horizon


Deepwater Horizon - April 20, 2010


Reporting - Procedures were subject to last-minute distribution and last-minute decisions.

Just - Concerns of rig workers regarding test results were muted, not heeded or explored.

Flexible - All involved seemed prepared to exercise flexibility, but this may be indicative of insufficient process discipline.

Learning - Invalid confidence in new slurry, vents from Mud-Gas Separator (MGS) allowed gas to enter rig spaces, insufficient planning for contingencies.

Engaged - Incorrect reading of pressure tests, lack of recognition or timely control action related to kicks, diverted flow through MGS instead of overboard, reluctance to activate Blow-Out Preventer (BOP), reluctance to activate the Emergency Disconnect System, BOP testing and maintenance.


Measuring Safety Culture


[Figure: 2015 Safety Culture Survey Results. Likert score by question number, grouped by subcomponent (Reporting, Just, Flexible, Learning, Engaged). Scale labels: 1.00 Very Dissatisfied / Strongly Disagree; 2.00 Dissatisfied / Disagree; 3.00 Slightly Dissatisfied / Disagree; 5.00 Satisfied / Agree; 6.00 Very Satisfied / Strongly Agree. Series include Round 1 (2010) and Round 2 (2012).]

[Figure: JSC R1 through R3 Comment Quality Analysis. Series: Comment Quality, Engagement. Horizontal axis: R1, R2, R3.]

“Quality” is equivalent to Likert Value associated with received comments.

“Engagement” is the average number of comments per SCS participant.
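
The survey measures described above reduce to simple averages. The sketch below shows one way they might be computed; all of the data is hypothetical, the function names are illustrative, and the 1 to 6 scale follows the chart's axis labels.

```python
from statistics import mean

# Hypothetical Likert responses (1 = Very Dissatisfied / Strongly Disagree,
# 6 = Very Satisfied / Strongly Agree), grouped by safety culture subcomponent.
SCORES_BY_SUBCOMPONENT = {
    "Reporting": [5, 6, 4, 5],
    "Just": [4, 5, 5, 3],
    "Flexible": [5, 5, 4, 4],
    "Learning": [6, 5, 5, 5],
    "Engaged": [5, 6, 6, 5],
}

# Hypothetical Likert values associated with received free-text comments.
COMMENT_LIKERT_VALUES = [2, 4, 5, 3, 6]
PARTICIPANTS = 20


def subcomponent_means(scores: dict[str, list[int]]) -> dict[str, float]:
    """Average Likert score for each subcomponent."""
    return {name: mean(values) for name, values in scores.items()}


def comment_quality(comment_values: list[int]) -> float:
    """'Quality': average Likert value associated with received comments."""
    return mean(comment_values)


def engagement(comment_count: int, participant_count: int) -> float:
    """'Engagement': average number of comments per survey participant."""
    if participant_count <= 0:
        raise ValueError("participant_count must be positive")
    return comment_count / participant_count


if __name__ == "__main__":
    for name, avg in subcomponent_means(SCORES_BY_SUBCOMPONENT).items():
        print(f"{name:10s} {avg:.2f}")
    print("Quality:   ", comment_quality(COMMENT_LIKERT_VALUES))
    print("Engagement:", engagement(len(COMMENT_LIKERT_VALUES), PARTICIPANTS))
```

Tracking these averages across survey rounds is what lets a change in safety culture show up as a trend rather than a single snapshot.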


Comment Temperature Perspectives


HOT: “Eliminate the recalcitrant dinosaur dictators”
WARM: “Communication”
TEPID: “Watch out for everyone”
COOL: “Keep doing what you are doing. We are constantly being reminded of Safety and its importance.”

“Emphasis on purpose of safety measures, not just filling out a form or checking a box.”


The Path to Operational Excellence


NASA, like other hazardous industries, has suffered catastrophic losses.

Human error will likely never be completely eliminated as a factor in our failures.

Acknowledging human frailty and the potential for failure bolsters our ability to manage risks and mitigate the worst consequences.

Building an effective Safety Culture bolsters skill-based performance that minimizes risk and encourages operational excellence.


Backup Charts 
Columbia STS-107, February 1, 2003:
• 7 fatalities;
• $3 Billion vehicle loss;
• 2.5 year mission impact.

NOAA N-Prime, September 6, 2003:
• $135 Million vehicle damage;
• 5.5 year mission impact.

Genesis, September 8, 2004:
• Some sample retrieval materials lost.

Orbiting Carbon Observatory, February 24, 2009:
• $280 Million vehicle loss;
• 5+ year mission impact.

Glory, March 4, 2011:
• $424 Million vehicle loss;
• ??? mission impact.

JSC Chamber B Asphyxiation, July 28, 2010
• Shoulder injury due to asphyxiation and fall.